Abstract

The rise of systems biology is intertwined with that of genomics, yet their primordial relationship to one another is 
ill-defined. We discuss how the growth of genomics provided a critical boost to the popularity of systems biology. 
We describe the parts of genomics that today share common areas of interest with systems biology, namely
gene expression, network inference, chromatin state analysis, pathway analysis, and personalized medicine, and
we outline upcoming areas of synergy as genomics continues to expand its scope across all biomedical fields.



The modern history of biological and medical sciences 
can be summarized in three words: ever-increasing spe- 
cialization. As biologists have methodically surveyed the 
characteristics of living systems at the multiple scales of 
molecular, cellular, organismal, and ecological organiza- 
tion, seminal discoveries and technical advances have 
spawned entirely new fields of research that quickly 
develop their own themes, vocabularies, and research 
culture. Whether intended or not, this specialization 
usually involves reducing the scope of the problem to 
focus research and ease the burden of tracking too 
many variables and concepts. This reductionist approach
is widespread across all sciences, and no biologist or other
biomedical scientist can claim deep knowledge of the
state of the art in more than a few of the hundreds of
subfields of biology. Running counter to this natural
balkanization are the handful of organizing principles that
span across all of biology such as natural selection and 
the central dogma of molecular biology. Why would 
anyone look for an alternative to reductionist biology 
after one hundred fifty years of unrelenting success? 

Several subfields of biology have discovered independently
that detailed studies of the structure and function of
individual parts did not give them a good understanding
of the emergent properties of the interactions of
many parts within a whole system. Neuroscience, physiol- 
ogy, and ecology all converged independently on the idea 
that it was as important to study and model the system of 
parts and their interactions as to fully analyze the 
individual parts alone. While system-centric studies in 
specific subfields of biology can be traced back to the 
1950s and 1960s, it was not until the mid-1990s that sys- 
tems biology developed into a major counter-movement 
that would grow to challenge the reductionist approach. 
A common definition of systems biology is the study of a 
given biological system by (a) the perturbation of a 
property of that system, (b) the measurement of resulting 
gene, protein, and pathway responses, (c) the integration of 
these data, and (d) the ultimate modeling of these data to 
describe the system as well as its response to perturbation 
[1]. We refer the reader to a review of the common ground 
of computational neuroscience with systems biology for a 
brief historical overview of the emergence of modern 
systems biology from pre-existing biological fields [2].
Fundamentally, the essence of systems biology is the study 
of interactions between parts of the system using experi- 
mental and computational methods. 

The tremendous growth of interest in systems biology 
was driven by the simultaneous rise of genomics [1,2], 
which is the field dedicated to the large-scale analysis of 
the properties of genomes. As the international human 
genome sequencing project ramped up, the scientific
community in concert with funding agencies devoted
increasing efforts to the development of computational 
methods for genome assembly, annotation, and analysis. 
The concurrent development of microarrays as the first 
platform for large-scale gene expression measurements led 
to the birth of the new field of functional genomics, which 
quickly expanded to include other biomolecules, namely 
proteomics and metabolomics for the measurements of 
protein levels and metabolic intermediates respectively. 
A hallmark of these functional data is that they represent
global measurements of thousands of molecular features 
where no one feature has an a priori higher importance 
than others. In this review we consider genomics in the 
broadest sense to include both structural and functional 
genome-wide measurements. In the context of transcrip- 
tomics, the mapping of transcripts onto exons on the gen- 
ome is a structural measurement, whereas the expression 
levels of transcripts are functional measurements. The 
transformation of biology by genomics from a relatively 
data-poor into a data-intensive field has motivated the 
development of novel computational, machine-learning 
and other quantitative methods for genomic analysis that 
attracted a large number of engineers, physicists, 
and mathematicians into biology. As gene expression 
and other functional data have accumulated through ever- 
larger scale projects such as ENCODE [3], significant 
efforts have been invested in integratively analyzing data 
to build gene regulatory networks [4,5]. Most models built 
from high-throughput genomics data tend to be correla- 
tive with relatively limited predictive power. This version 
of systems biology that emphasizes parts (a)-(c) of the defi- 
nition above is barely recognizable to other biologists who 
associate systems biology with a more mathematical mod- 
eling driven approach that attempts to explain biological 
phenomena of a system with a limited number of parts 
using differential equations, which emphasizes part (d) of 
the definition. This difference of opinion leads to passio- 
nate discussions of whether genomic analyses qualify as 
systems biology or do not. We believe that, in the broadest 
sense, many parts of genomics do fall within the purview 
of systems biology. We do not attempt to give here an 
exhaustive review of genomics or systems biology because 
of the vast literature of each field. Instead, this review 
delineates explicit areas of overlap between genomics and 
systems biology related to transcriptomics, metabolomics, 
and gene regulatory network inference as well as outlines 
some of the genomic challenges that will likely drive the 
field forward. 

Defining the overlap between genomics and systems 
biology 

Why not consider all of genomics or at least all of
bioinformatics to be part of systems biology? After all, most
genomic experiments generate thousands to billions of
data points that require quantitative, bioinformatic meth- 
ods for analysis. However, there are bioinformatic tasks 
that provide little direct insight into a system without 
further analysis. For example, the mapping of reads and 
the assembly of genomes are two critical, foundational 
activities of genomics that pose some of the greatest 
algorithmic challenges and are very active areas of 
research. Yet the resulting assembly or the location of 
reads onto a genome are not informative on their own 
about the system, but require further analysis with addi- 
tional tools. We can use the operational definition of sys- 
tems biology as the study of interactions between parts of 
the system to identify areas of genomics that are clearly 
systems-centric and others that are more dependent on 
the goals of the experiment. For example, while sequen- 
cing a transcriptome solely for discovery of novel tran- 
scripts would not fall within the realm of systems biology, 
the analysis of the change of gene expression in existing 
and newly-discovered transcripts during a developmental 
time course or after a perturbation such as an siRNA 
certainly would qualify. Similarly, the identification of 
SNPs in an individual genome would not qualify as sys- 
tems biology, but the quantification of their effect on the 
expression of associated genes and the identification of 
gene expression Quantitative Trait Loci (eQTL) [6,7] 
definitely would. The sequencing of cancer genomes to
identify the mutations driving the cancer represents a
third such systems approach to how changes in one part
of the system affect the behavior of the whole system.

We can broadly describe the many subfields of genomics 
as falling under three over-arching categories based on 
their relationship with the genome under study on a 
continuum of "pure" to "applied" genomics: global, 
general, and specific (Figure 1). Sub-fields of genomics 
that focus on a global view are those that provide a single 
answer that is essentially identical for all individuals from 
that species. This would include the reference genome 
assembly and annotation as well as comparative genomics 
(Figure 1). While algorithmically challenging, these 
analyses do not shed light on the behavior of the system 
per se, but rather represent a catalog of the parts, which
can later serve as a starting point for systems-level ana- 
lyses. The second category encompasses fields of genomics 
that are interested in analyzing specific aspects of the gen- 
ome "in action" such as transcriptome discovery in a 
specific cell-type, or tissue, or the analysis of the encoding 
of the transcriptional logic of gene regulatory networks 
underlying development. These problems can be generally 
reframed within a systems framework to get insights into 
their function and behavior. For example, the study of the 
dynamics of gene regulatory networks represents one of 
the primary problems in the field of transcriptional
regulation that is also a classical example of systems
biology [1]. Finally, there are a multitude of genomic
experiments that are specific to the factor, or cell type, or
individual under study (Figure 1). This includes examples 
such as transcription factor interactomes measured using 
ChIP-seq [8], transcriptome quantitation using RNA-seq
[9], and genome variation analysis in individual genomes
[6,7]. A characteristic of these problems is that they
typically require associating called peaks, expression levels, or
variants to specific genes and inferring functional enrich- 
ment in pathways using tools such as pathway analysis 
[10,11] and Gene Ontology [12], which fall under our defi- 
nition of systems biology. In our post-Sanger-sequencing 
world, another characteristic of genomic problems suited 
for systems biology is that their starting data typically 
comes in the form of millions of data points such as short 
reads that enable statistically analyzable counting assays. 
While global problems such as genome assembly will 
always benefit from ever longer reads, counting assays 
benefit primarily from additional reads rather than longer 
read length (Figure 1), as discussed in a separate review 
[13]. We now turn to a more detailed analysis of individual 
genomic fields and their relationship to systems biology 
(Figure 2). 

Gene expression 

Whereas the elements that make up the genetic defini- 
tion of living organisms are encoded into the genome, it 
is the ensemble of expressed genes that are the actual 
manifestation of the biological system. Regardless of 
how gene expression is regulated, expressed transcripts 
are prerequisite, primary components of cell physiology.

Figure 1 The continuum of genomics problems. Representative
areas of genomics are arranged along a continuum of pure to 
more applied genomic research problems that can be grouped into 
three overall categories of global, general, and specific problems 
with respect to the genome under study. In this schema, genomic 
problems that can benefit from a systems biology approach 
generally fall under the general and specific columns and typically 
rely on counting assays that leverage the large number of reads or 
datapoints generated by modern high-throughput platforms. 



The development of genome-wide gene expression profil- 
ing technologies, using microarrays first and sequencing 
since, has brought the analysis of gene expression into the 
realm of systems biology. There exist a variety of techni- 
ques such as arrays, SAGE, CAGE, and RNA-seq that 
allow for different combinations of quantitative (transcript 
expression levels and differential expression) and/or dis- 
covery (splicing events and transcript intron/exon organi- 
zation) analyses of transcriptomes. While the 
measurement of expression genome-wide is only the first 
step in deriving system-level knowledge, it presents analy- 
tical challenges to this day. The main reason is that geno- 
mic experimental techniques measure individual parts of 
the system in parallel but cannot directly measure the sys- 
tem structure, which needs to be inferred. This inference 
is complicated by the high variable-to-observation ratio of
genomics data, which makes the marriage of genomics and
systems biology intrinsically and heavily underdetermined.
Experimental designs involving time courses and/or
perturbations can provide significantly more, but rarely
enough, information on the underlying system structure.
Computational biologists address this high underdetermi- 
nation problem using strategies such as variable selection 
[14-17], model constraints by additional data [18] or 
exhaustive search of the results space to reach local opti- 
mal solutions [19-21]. However, there are still methodolo- 
gical and conceptual limitations that must be overcome to 
bridge the gap between simple gene expression analysis 
and the inference of molecular systems. For example, 
most of the popular differential gene expression methods 
that are used for variable selection provide single 
gene-based assessments of differential expression [22-25]. 
While these methods incorporate parameterizations to 
account for the high dimensional nature of genomics data, 
such as pooled variance estimates [22] or multiple testing 
correction [26], they completely ignore the interactions of 
genes as parts of large-scale biological pathways and
systems [27]. A first step in this direction is the application of
multivariate methodologies to transcriptome analysis that
exploit the covariance structure of the expression data
matrix to infer patterns of gene expression and select
genes for their relevance in those patterns [28,29]. The
underlying hypothesis here is that covariance is a proxy of 
co-expression and that relevant processes (and genes) of 
the system can be identified by their co-expression.
Approaches that come closer to a systems-driven
analysis of differential expression have used gene
network data to guide the multivariate analysis under the
assumption that genes for which an interaction exists
are correlated in their differential expression states
[30,31] or have taken an Empirical Bayes approach by 
modeling networks as a Markov random field (MRF) to 
identify genes and sub-networks that are related to 
diseases [32,33]. 
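
The single-gene workflow described above can be sketched as follows. This is a minimal illustration, assuming that per-gene p-values have already been computed by some differential expression test and that the multiple testing correction is the Benjamini-Hochberg procedure (the text does not name a specific correction, so this choice is an assumption). The gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Rank genes from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from independent single-gene tests.
pvals = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
         "geneD": 0.041, "geneE": 0.60}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
significant = [gene for gene, q in qvals.items() if q <= 0.05]
```

Each gene is assessed on its own here: nothing in the procedure knows whether geneC and geneD sit in the same pathway, which is exactly the limitation that the network-guided approaches try to address.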

Figure 2 Relationships between systems biology and genomics. Functional genomics assays like gene expression profiling, metabolomics
and proteomics are used as input data by different systems-level analysis approaches such as Gene Regulatory Network (GRN) inference, 
Pathway Analysis (PA) and Flux Balance Analysis (FBA). Functional annotation, a core activity of genomics, is a prerequisite in PA and FBA, and 
helps in the interpretation of GRNs. GRN and FBA generate models of the biological system based on genomics data and can also use pathway 
databases as a priori information to help building models of the system. Alternatively, PA can be directly employed as an interpretative tool of 
the system. Systems Medicine relies on GRNs and pathways to develop personalized genomic diagnosis tools. Metagenomics expands the 
system under study to a supraorganismal level, whereas novel systems-level annotation paradigms such as transcript annotation expand the 
scope of functional annotations. 



Network inference as the common core of genomics and 
systems biology 

When considering the interplay between systems biology 
and genomics, two major paradigms stand out: one is 
the use of gene expression measurements to obtain the 
structure of the system and infer Gene Regulatory Net- 
works (GRNs), while the other is the leveraging of sys- 
tem properties to interpret observed gene expression 
patterns using pathway enrichment methods (Figure 2). 

The systems biology of gene expression is frequently 
understood as a problem of gene regulatory network 
inference, where gene networks capture how the expression
profiles of individual genes interact with each other
[1]. The encoding and dynamics of transcriptional regula- 
tion have fascinated scientists ever since the seminal work 
of Jacob and Monod on the lac operon in E. coli [34]. The 
last fifty years have clearly shown that transcriptional
regulation is encoded in GRNs that robustly control
spatiotemporal expression of genes to enable proper
development and function from the simplest bacteria to the most
complex animals and plants [35]. Popular systems-based
approaches such as Weighted Gene Correlation Network
Analysis (WGCNA) have been applied successfully to a
variety of biological settings to identify modules of tightly
co-expressed genes in cases such as single-cell human and 
mouse RNA-seq time-courses of early pre-implantation 
embryonic development [36]. Recent efforts to build
genome-wide GRNs from transcription factor ChIP-seq [4]
and DNase-hypersensitivity data [5] from multiple cell 
types have heavily focused on the regulation of the regula- 
tors such as transcription factors and other signaling- 
related genes. These regulatory proteins form the core of 
GRNs with complex, intertwined feedback loops between 
regulators at the transcriptional and often post-transcrip- 
tional level. Systems-based techniques play a key role in 
the analysis of GRNs as it is nearly impossible to under- 
stand the behavior of a moderately complex GRN that 
incorporates feedback loops without modeling. As it 
becomes increasingly practical to map the regulatory 
linkages in GRNs from large-scale functional sequencing 
data, the challenge of modeling and predicting the 
dynamics of GRNs becomes ever more pressing. 
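
To illustrate why feedback loops call for modeling, the sketch below simulates a hypothetical two-gene circuit in which each gene represses the other. The equations (Hill-type repression with linear decay), the parameter values `alpha` and `n`, and the simple Euler integration are all assumptions chosen for illustration; they are not taken from any of the cited studies.

```python
def simulate_toggle(x0, y0, alpha=4.0, n=2, dt=0.01, steps=5000):
    """Euler integration of a mutual-repression circuit.

    dx/dt = alpha / (1 + y**n) - x
    dy/dt = alpha / (1 + x**n) - y
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Two nearby starting points on either side of the diagonal x == y.
high_x = simulate_toggle(2.0, 1.0)
high_y = simulate_toggle(1.0, 2.0)
```

Under these assumed parameters the two runs settle into opposite expression states even though the wiring and parameters are identical, a bistable behavior that is hard to anticipate from a list of regulatory linkages alone.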

The rapid increase in the amount of complementary
chromatin-related data collected from the same sample,
such as ChIP-seq of different histone modifications and
transcription factors as well as DNase-seq, has led to the
development of a new set of tools to analyze the data
integratively in order to learn more about the global orga- 
nization of the genome. Two very different approaches 
have been used to analyze such data, which is typically 
preprocessed into genomic segments with boundaries 
derived from the data signal. The first approach uses the 
chromatin data on the segments for training of Hidden 
Markov Models [37] or Dynamic Bayesian Networks [38] 
to learn the smallest number of states that can recapitulate 
the major processes of transcriptional regulation and effec- 
tively annotate the genome de novo. The hidden states 
from these models are learned from the data and their 
chromatin signatures are interpreted post-training to 
associate particular states with promoters, enhancers, 
transcribed regions, or repressed regions. While there is 
no a priori defined number of states that we can safely 
expect in the genome, smaller numbers of states (preferably
fewer than 20) are often preferred for the sake of
interpretability. However, the combinatorial nature of gene
regulation points to another extreme, where we are inter- 
ested in identifying relatively small cohorts of genomic 
regions that show similar coordinated changes of chroma- 
tin marks and transcription factor binding across many 
data sets and multiple cell types. In such cases, we would 
like to interrogate the genome with a much larger number 
of potential micro-states and then apply some form of 
dimension reduction to identify related micro-states that 
form larger coherent groups of "meta-states". The
Self-Organizing Map (SOM), another unsupervised machine
learning clustering technique, has been used in two
recent publications to analyze a large number of ChIP-seq
(and DNase-seq) datasets using maps with a thousand or
more such micro-states [39,40]. The maps consist
of thousands of units (or "neurons") that are arranged
in a two dimensional grid. In order to avoid boundary 
effects, the maps are often laid on the surface of a toroid 
that can be unwrapped for visualization. Each unit of the 
map has an associated vector that is originally initialized 
randomly. The map is trained using the vectorized signal 
from the datasets (either binarized [40] or using RPKM 
signal density [39]) for each segment until the map con- 
verges. Every segment is then assigned to the best match- 
ing unit on the map. The resulting map is mined for 
relationships between training dataset enrichments in spe- 
cific units and can be interpreted further by laying addi- 
tional data on the map not used during the training. These 
maps typically reveal very distinct colocalization patterns 
between particular datasets in specific cell-types. While 
the results from the hidden-state-based or SOM-based 
approaches are global, they can both be mined to identify 
the actual, underlying regulatory elements encoding 
the GRNs and will presumably be used for further 
automated attempts to derive networks from functional 
sequencing data. 
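
The SOM procedure described above (random initialization, training on vectorized signal, toroidal layout, assignment of each segment to its best matching unit) can be sketched on a toy scale as follows. This is a minimal illustration rather than the implementation used in [39,40]: the 4 x 4 grid, the linear decay schedules, the Gaussian neighborhood, and the binarized three-dataset segments are all assumptions.

```python
import math
import random


def torus_dist(a, b, rows, cols):
    """Grid distance between units a and b with wrap-around edges."""
    dr = min(abs(a[0] - b[0]), rows - abs(a[0] - b[0]))
    dc = min(abs(a[1] - b[1]), cols - abs(a[1] - b[1]))
    return math.hypot(dr, dc)


def best_unit(weights, vec):
    """Return the (row, col) of the unit whose vector is closest to vec."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], vec)))


def train_som(data, rows=4, cols=4, epochs=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Each unit starts with a randomly initialized vector.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac            # learning rate decays linearly
        radius = 0.5 + 2.0 * frac    # neighborhood shrinks over time
        for vec in rng.sample(data, len(data)):
            bmu = best_unit(weights, vec)
            for unit, w in weights.items():
                d = torus_dist(unit, bmu, rows, cols)
                h = math.exp(-(d * d) / (2.0 * radius * radius))
                # Pull the unit toward the sample, weighted by proximity.
                for k in range(dim):
                    w[k] += rate * h * (vec[k] - w[k])
    return weights


# Hypothetical segments with binarized signal in three datasets.
segments = {
    "seg1": [1, 1, 1], "seg2": [1, 1, 1], "seg3": [0, 1, 1],
    "seg4": [0, 0, 0], "seg5": [0, 0, 0],
}
som = train_som(list(segments.values()))
assignment = {name: best_unit(som, vec) for name, vec in segments.items()}
```

Segments with the same signal profile land on the same unit, and additional datasets not used in training could be overlaid on `assignment` afterwards, as the text describes.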



However, the concept of molecular networks extends
beyond gene regulatory networks. In fact, much of the 
early research in systems biology focused on flux 
balance analysis (FBA), which is a genome-wide analysis 
of metabolic regulation [41,42]. FBA relies on simple
stoichiometry rather than difficult-to-measure enzyme
kinetics to analyze the behavior of metabolic
networks. FBA employs a linear programming (LP) strategy
to generate a flux distribution that is optimized toward 
a particular 'objective', normally maximal cell growth, 
subject to a set of underlying physicochemical and 
thermodynamic constraints fitting experimental data on 
changes at nutritional and metabolic levels [43]. FBA 
can be integrated with genome-wide data by
incorporating gene expression measurements into
metabolic modeling (Figure 2). This combination enables the
characterization of the regulatory modalities governing
metabolism and the identification of metabolic hubs
[44-46]. For example, an analysis of yeast strains grown 
in different nutritional conditions combined Z-scores of 
metabolic fluxes obtained by either metabolic or gene 
expression measurements to classify the regulation level 
of metabolic circuits as transcriptionally, post-transcrip- 
tionally, or metabolically controlled [46]. In another 
study, FBA and gene expression were combined to pre- 
dict the impact of 75 different drugs, drug combinations, 
and nutrient conditions on mycolic acid biosynthesis 
capacity in M. tuberculosis, using a public compendium 
of over 400 expression arrays [44]. The authors showed 
that e-Flux (expression and flux) analysis can be used to 
correctly predict the modulators of metabolite biosynth- 
esis and the metabolic state under specific nutritional or 
treatment conditions. 
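
The linear program underlying FBA can be written compactly. The formulation below is the generic textbook form, given here as a sketch; the symbols are introduced for illustration and do not appear in the original text. Let S be the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a vector that selects the objective (for example, the biomass reaction for maximal growth), and l_j, u_j the lower and upper bounds on flux j.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption that no internal metabolite accumulates. One simple way to bring expression data into this framework, in the spirit of the expression-and-flux approach mentioned above but stated here only as an illustrative assumption, is to tighten the upper bound of each reaction in proportion to the normalized expression e_j of its enzyme:

```latex
u_j = u^{\max}\,\frac{e_j}{\max_k e_k}
```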

The combination of metabolic modeling and gene 
expression analysis is not only relevant for drug target 
discovery, but is also of major importance for targeted 
metabolic engineering and synthesis of economically 
relevant compounds, energy production or waste treat- 
ment [47,48]. This economic potential, together with the 
development of cost-effective sequencing technologies, 
has boosted the sequencing of the genomes of novel 
microorganisms for biotechnology applications. A key 
element of the success of these approaches is the avail- 
ability of efficient genome and annotation algorithms 
that characterize the metabolic potential of the newly 
sequenced genomes (Figure 2). Reference functional 
databases such as KEGG [49] and AraCyc [50] are fre- 
quently used as the backbone for metabolic reconstruc- 
tion, which needs to be further complemented by 
algorithms that build the genome-wide metabolic net- 
work, fill in reaction gaps and validate predictions [51]. 
In this sense genome (functional) annotation, a core 
activity of genomics, is a necessary prerequisite for the 
computation-based reconstruction of the metabolome of
novel species and hence serves as a substrate for
systems-level analyses of genomic data (Figure 2).

Pathway analysis as the interpretative tool of systems 
biology 

The inference of gene and molecular networks is 
focused on mapping the mechanistic and structural 
properties of the system. Genome-wide gene network 
analyses typically produce large networks that involve 
hundreds of gene interactions. Such networks might 
have interesting topological properties that are biologi- 
cally meaningful, but are normally difficult to interpret 
in terms of cellular functionality. Functional enrichment 
analysis methods (also referred to as pathway or gene 
set enrichment) are methodologies that allow us to ana- 
lyze gene expression data for the biological meaning of 
particular expression patterns in order to gain additional 
insight into the actual biology of the system [52-54] 
(Figure 2). These functional assessment methodologies 
rely on the premise that the expressed components of 
cellular systems are likely to be functionally coordinated 
and that genes belonging to the same functional unit 
should show similar expression profiles. The first func- 
tional enrichment analysis methods identified pathways 
that were overrepresented within a list of differentially 
expressed genes [52] and were rapidly followed by the 
gene set enrichment approach [53] where a ranked, 
rather than a selected, list of genes was used to find 
associations between phenotypes and cellular functions. 
There are now a multitude of implementations based on 
this concept that introduce additional functional data 
such as protein interaction data [55,56], gene regulatory 
networks [57], pathway topology information [58], meta- 
bolic changes [59,60] or expression kinetics [61]. These 
methods have been applied not only to understand gene 
expression changes but also in Genome-Wide Associa- 
tion Studies (GWAS) [62], comparative genomics [63] 
and gene prioritization [64]. 
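
The first family of methods, which asks whether a pathway is overrepresented within a list of differentially expressed genes, can be sketched with a one-sided hypergeometric test. This is a minimal illustration under assumptions: the choice of the hypergeometric test, the function name `enrichment_pvalue`, and the toy gene identifiers are not taken from the cited methods.

```python
from math import comb


def enrichment_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_pathway of which are in the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical gene universe, pathway, and differentially expressed list.
universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
de_genes = {"g0", "g1", "g2", "g10"}

overlap = len(pathway & de_genes)
p = enrichment_pvalue(len(universe), len(pathway), len(de_genes), overlap)
```

In practice one such test is run per pathway and the resulting p-values are corrected for multiple testing. Gene set enrichment on a ranked list avoids the hard cutoff that defines `de_genes` here.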

Personalized medicine and other upcoming challenges 

The rapid availability of ubiquitous sequencing holds 
great promise for medicine to the extent that genomics 
empowers the analysis of patient genomes to guide per- 
sonalized treatment. While we can now sequence an 
individual's genome and transcriptomes, it remains 
extremely difficult to use that data to inform treatment. 
We currently lack the capability to evaluate the impact
and functional consequences of most of the sequence
variants found. A decade of GWAS
has revealed a multitude of common variants
associated with various traits and diseases, each of which
seems to contribute to or at least to increase the prob- 
ability of a phenotype by a small amount [65]. Yet most 
of these variants are in non-coding regions and we are
often not even certain of the association of the variant
with an actual gene. Parallel efforts by projects such as 
ENCODE [3] to annotate the functional parts of the 
genome have highlighted the functional complexities of 
the genome beyond coding sequences. A recent study 
found that 76% of non-coding GWAS SNPs associated 
with various phenotypes or diseases are found within or 
in perfect linkage disequilibrium with DNase hypersensi- 
tive sites called within ENCODE and the NIH Epige- 
nomic Roadmap Project, which suggests that they are 
associated with functional regulatory elements [66]. 
While this is highly encouraging, we are still unable to 
assess the contribution of these changes in functional 
elements to what are often complex phenotypes that
arise from combinatorial interactions between
multiple variants occurring jointly at genes, let alone
their interactions with the environment. Adopting
methods from systems biology to marshal the data into
tractable, predictive models can shed light on the contri- 
butions of these individual variants to the phenotypes 
under study. For example, an interesting application of
systems biology to personalized medicine was the use
of flux balance analysis (mostly used in prokaryotic
metabolic reconstruction) to model the
metabolism of a single Hereditary Hemorrhagic
Telangiectasia patient to identify altered metabolic fluxes and
to devise a personalized treatment that eventually
improved the patient's condition [67] (Figure 2).

A particularly compelling set of use cases for the applica- 
tion of systems biology to understand the genomics of 
disease can be found in cancer. Efforts to characterize the 
most prevalent mutations of various cancers by The Cancer 
Genome Atlas (TCGA) Research Network have revealed 
recurrent mutations in specific pathways. The recent 
TCGA Lung Squamous Cell Carcinoma (LSQCC) study 
[68] represents a particularly nice example of what can be 
accomplished by combining genome and transcriptome 
sequencing with systems-level pathway analysis. The 
sequencing of 178 patient samples found that in addition 
to universal mutations to TP53, each cancer carried 
higher-order combinations of multiple recurring
mutations. LSQCC was divided into four subtypes based on a
combination of expression, copy number variation, and 
methylation. Most promisingly, the authors found recurring 
mutations within targetable oncogenic pathways such as
PI(3) kinase, RAS, and Receptor Tyrosine Kinases. Even in the
rare cases where cataloguing mutations in cancer alone reveals
both mechanisms of disease progression and potential
druggable targets, we are left with the greater challenge
of understanding how some cancers can relapse after 
treatment. One possible solution is the use of network con- 
cepts to identify groups of genes that when perturbed give 
the same phenotype and hence form a disease module [69]. 
If systems biology can rise to the challenge of predicting
the mutations that are most likely to allow a cancer to
relapse, we may be able to design multi-drug treatments 
that will prevent cancers from evading conventional drug 
treatments. More generally, systems biology holds the pro- 
mise of helping to decrease the time and costs of develop- 
ing new drugs and also helping to provide more targeted 
and safe candidate drugs by leveraging pathway analysis. At 
the same time, the new field of pharmacogenomics seeks to 
understand the interactions between drugs and individuals' 
genotype. For example, about 14% of the population carry
the *2 allele of the cytochrome P450 CYP2C19 that pre- 
vents the proper processing of the anti-clotting drug clopi- 
dogrel (Plavix) and thus renders the drug ineffective [70]. 
Just as in the case of cancer genomics, we need to use sys- 
tems biology approaches if we are to capitalize on the 
patient's genome to identify how variants interact with 
drugs and predict what the ultimate effectiveness of these 
drugs might be in a specific patient rather than averaged 
over the whole population. 

Another area of great promise for mutual reinforce- 
ment between systems biology and genomics is in the 
study of the composition and interactions of bacterial 
communities with their environment (Figure 2). A variety 
of sequencing projects have revealed that large numbers 
of uncharacterized microbial species interact cooperatively 
in every imaginable ecological niche. 
This includes the microbial communities that 
are associated with specific human body niches and are 
characteristic of several human conditions such as 
inflammatory bowel disease (IBD) and obesity [71]. 
While we are accumulating large metagenomic datasets 
and cataloguing bacterial genomes that make up the dif- 
ferent parts of the human microbiome in normal and 
diseased individuals, it is still very difficult to connect 
the presence or change in frequency of specific bacterial 
species with the associated phenotypes. An early exam- 
ple of metagenomic systems biology beyond simple 
comparative studies treats the entire metagenome as a 
single system and analyzes the changes in metabolic 
networks inferred from topological models of healthy 
and diseased metagenomes in IBD and obesity [72]. The 
ultimate challenge will be to model the interactions of 
the microbiome community with the host. 

Functional annotation is, as pointed out earlier, a fundamental 
substrate of systems biology. Functional annotation 
provides a priori knowledge, interaction constraints and 
an interpretative framework for systems biology (Figure 2). 
More effective methods for functional annotation are 
necessary to leverage further genomic data for 
system-level analyses. For example, nearly half the genes of higher 
eukaryotes encode proteins of unknown function or are 
non-coding genes that await functional characterization. 
High-throughput screening methods for protein-coding genes 
[73,74] or computational predictive approaches in the case 
of non-coding transcripts [75] might help to speed up 
functional characterization but are still in their infancy 
and far from being generally applicable. Moreover, the 
systems-oriented analysis of gene expression still has 
much to evolve both methodologically and conceptually. 
For example, pathway methods rely on existing annotation 
data that points to which genes are involved with specific 
cellular roles, but most annotation databases are static and 
do not incorporate tissue- or development-specific 
information. Moreover, the assignment of genes to functions is 
still a largely unfinished task and the boundaries of path- 
way definitions are arbitrary: one database might include a 
set of genes within a specific signaling pathway while 
another would split this into two separate pathways. The 
best way to reconcile different pathway views and to 
capture the plasticity of signaling and metabolic pathways 
is still an open question in genomics research. Addition- 
ally, functional enrichment methods typically consider all 
genes in the gene set as equally contributing to the func- 
tional capacity of the set, thereby ignoring the stronger 
regulatory role of some pathway components and hence 
their differential impact on the pathway functionality. 
Relevant pathway genes could be identified by their 
network properties, as is done in systems medicine [68], or 
by being highly regulated, as in Pathway Network Analysis 
[76]. This strategy is predicated on the concept of driving 
genes that account for most of the variability in the 
coordinated expression of the pathway and are major con- 
tributors to changes in pathway activity [76]. However, 
there is still a need for accurate systematic approaches to 
dissect the differential relevance of genes within pathways. 
As gene expression analysis continues the transition to 
high-throughput sequencing, transcripts rather than genes 
become the fundamental measured feature, which will 
require updating functional profiling methods to 
support transcript-level functional enrichment analyses. In 
particular, the analysis of the functional consequences of 
alternative splicing within a systems framework, such as 
the analysis of the exonic targets of the neuronal splicing 
factor NOVA1 [77], remains a relatively unexplored area 
that seems destined for advances with more accurate 
transcript reconstruction methods from RNA-seq data. 
Interestingly, as systems biologists and bioinformaticians build 
widely used tools for pathway analysis and differential 
gene expression, their end users do not necessarily 
consider themselves to be doing systems biology explicitly, 
even though they report system-level analyses in their 
publications. For example, the pathway analysis tool 
PARADIGM is designed to find pathway-level changes in 
cancers using graphical models that clearly fall within 
the scope of systems biology [78], yet its users in TCGA 
(such as in [68]) do not claim to take explicit systems 
approaches. We take this to be a sign of the success of 
system-level analyses. 
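
The limitation of functional enrichment methods noted above, namely that all genes in a gene set are treated as contributing equally, can be illustrated with a minimal sketch. The first function is a standard one-sided hypergeometric over-representation test. The second is a purely hypothetical weighted score, included only for contrast; it is not the method of [68] or [76]. The gene names, weights, and background size are invented toy values.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X counts pathway genes among n_selected genes drawn without
    replacement from n_background genes, of which n_in_set belong
    to the pathway. Every pathway gene counts equally.
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def weighted_overlap_score(pathway_weights, selected_genes):
    """Hypothetical score: share of total pathway weight that was selected.

    pathway_weights maps gene -> assumed importance (for example a
    network centrality); the weights are illustrative assumptions.
    """
    total = sum(pathway_weights.values())
    hit = sum(w for gene, w in pathway_weights.items() if gene in selected_genes)
    return hit / total


# Toy pathway: one strongly regulating gene and three peripheral genes.
toy_weights = {"HUB": 5.0, "A": 1.0, "B": 1.0, "C": 1.0}
hits_hub = {"HUB", "X"}  # selection containing the regulator
hits_a = {"A", "X"}      # selection containing a peripheral gene

# Both selections overlap the pathway by one gene, so the
# hypergeometric test cannot tell them apart.
p_hub = hypergeom_enrichment_p(20, len(toy_weights), len(hits_hub), 1)
p_a = hypergeom_enrichment_p(20, len(toy_weights), len(hits_a), 1)

# The weighted score separates the two selections.
s_hub = weighted_overlap_score(toy_weights, hits_hub)
s_a = weighted_overlap_score(toy_weights, hits_a)

print(p_hub, p_a, s_hub, s_a)
```

In this toy case the two selections receive the same p-value but different weighted scores, which is the kind of differential relevance that unweighted enrichment ignores. How such weights should be derived in practice is left open here, as it is in the text.
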
Although genomics and systems biology have started 
to reshape a multitude of areas in biology, new subfields 
evolve with ever more specialization. Ironically, we 
suspect that systems biology and genomics are in fact 
contributing to a new era of specialization by creating 
entirely new subfields such as developmental systems 
biology or pharmacogenomics. However, as genomics 
continues to expand and to mature by addressing nearly 
every imaginable biological question, it is increasingly 
clear that the primary analysis of the resulting data 
alone is no longer sufficient for extracting new biologi- 
cal insights. Instead, we need to leverage the ideas 
and techniques of systems biology to understand the 
behavior of the system and its multitude of parts. Simi- 
larly, the challenges that genomics is now tackling by 
integratively analyzing ever higher-dimensional, multi- 
species systems will likely require the development of 
more sophisticated hierarchical models by the systems 
biology community to enable meaningful joint compara- 
tive analyses. Last but not least, genomics will need to 
leverage systems biology by building predictive models 
from personal genome data to produce actionable 
results for patient care that deliver on the promise of 
precision medicine. Thus there is much more work to 
be done jointly by genome and systems biologists. 